This paper presents the Mathematics Grammar Library, a system for multilingual mathematical text
processing. We explain the context in which it originated, its current design and functionality,
and the current development goals. We also present two prototype services and comment
on possible future applications in the area of artificial mathematics assistants.*



1 Introduction 

An archetypal meeting point for natural language processing and mathematics education is the realm of
word problems [?, 14, 15], a realm in which mechanised mathematics assistants (MMA) are expected to
play an ever more prominent role in the years to come.

The following example, to which we will refer later on (last subsection of Section 3), is meant to
illustrate in a concrete way the idea of a word problem:

A farm has ducks and rabbits. There are 100 animals and they have 260 legs. How many ducks and rabbits 
are there in the farm? 

We envision the Mathematics Grammar Library (MGL) presented in this paper as an enabling technology
for multilingual dialog systems capable of helping students in solving and learning how to solve
word problems. This confidence is grounded in the potential capabilities of MGL for dealing effectively
with a mixture of text and mathematical expressions, capabilities that in turn depend crucially on the
formal abstract way in which the semantics is captured.

Since formal semantics is amenable to algorithmic processing, the library can manage, in addition
to parsing and rendering natural language with mathematical expressions, powerful interactions with
ancillary Computer Algebra Systems (CAS) or Computer Theorem Provers (CTP). As these are key
ingredients for advanced MMAs, our working hypothesis is that MGL is a good basis on which to build
useful MMAs for learning and teaching (cf. [17, 18] for some general clues on e-learning technologies).

In this context, the current general aim for MGL is to provide natural language services for
mathematical constructs at the level of high school and college freshman linear algebra and calculus.
At the present stage, the concrete goal is to provide rendering of simple mathematical exercises in
multiple languages (see [?] for a demo of the expressions available and also the examples in Section 3).

For reading convenience, we include a short glossary of terms that will be used in the rest of the 
paper. 

GF Grammatical Framework: A programming language for multilingual grammar applications. Based
on functional programming and type theory, the framework supports abstract grammars, which
capture meaning in a formal way, and concrete grammars, which enable multilingual
rendering. See [?, 10]. The library MGL is programmed in GF, in a way that is comparable to
how numerical libraries are compiled from C or Fortran sources.

OPENMATH A de facto standard for mathematical semantics, usually abbreviated as OM. It is "an
extensible standard for representing the semantics of mathematical objects, allowing them to be
exchanged between computer programs, stored in databases, or published on the worldwide web"
(see [8]). It is structured in Content Dictionaries (CD's), each of which defines a collection of
mathematical objects.

SAGE Aimed at "creating a viable free open source alternative to Magma, Maple, Mathematica and
Matlab", Sage is the result of an on-going collective endeavour led by William Stein. See [?, 12]
for a description of the system and its functionalities.

WebALT European digital content for the global networks project (Contract Number EDC-22253).
Developed in 2005 and 2006, WebALT aimed at using existing standards for representing
mathematics on the web and existing linguistic technologies to produce language-independent
mathematical didactical material. See [?].

* The research leading to these results has received funding from the European Union's Seventh Framework Programme
(FP7/2007-2013) under grant agreement no. FP7-ICT-247914.

2 Background 

For a closer view of MGL, let us look briefly at its origins. The idea behind MGL was born, to a good
extent, from reflecting on one of the key results of the WebALT project. In summary, this reflection
unfolded as follows.

One of the aims of WebALT was to produce a proof-of-concept platform for the creation of a mul- 
tilingual repository of simple mathematical problems with guaranteed quality of the (machine) trans- 
lations, in both linguistic and mathematical terms. The languages envisioned were Catalan, English, 
Finnish, French, Italian and Spanish. Of these, Finnish, with its great complexities, could not be raised 
to the same level of functionality as the others. 

The WebALT prototype was successful and, as far as we know, that endeavour brought about the
first application of the GF system for the multilingual translation of simple mathematical questions. The
powerful GF scheme, based on the interlocking of abstract and concrete grammars, was found
to be a very sound choice, but the solution had several shortcomings that could not be addressed in that
project. For the present purposes, the following three were the most relevant:

• The grammars did not work for later versions of GF (>2.9). 

• The library was not modular with respect to semantic processing, and hence not easy to maintain. 

• It included too few languages, especially as seen from a European perspective.

The springboard for the present library was the need to properly solve these problems, inasmuch as 
this was regarded as one of the most promising prerequisites for all further advanced developments in 
machine processing of mathematical texts. Thus the main tasks were: 

• To design a modular mathematics library structured according to the semantic standards (content 
dictionaries) of OpenMath. 

• To code it in the much more expressive GF 3.1 for the few languages mentioned above, and

• To write new code for a few additional languages (Bulgarian, Finnish, German, Romanian and
Swedish).

The first two points amount to a tidying of the original WebALT programming methods. The third point
represents not merely an addition of a few more languages, but a thorough testing of the methods and
procedures enforced in the preceding steps. This testing is important in order to secure the rules for the
inclusion of further languages and for a controlled uniform extension of the available grammars.

To end this section, we include a few notions about the GF system that will ease the considerations
about MGL in the next section. For a thorough reference on GF, see [10].

Any GF application begins by specifying its abstract syntax. This syntax contains declarations of 
categories (the GF name for types) and functions (the GF name for constructor signatures) and has to 
capture the semantic structure of the application domain. For example, to let Nat stand for the type of 
natural numbers and Prop for propositions about natural numbers, the GF syntax is 

    cat Nat ; Prop ;

That 'zero is a natural number' and that 'the successor of any natural number is a natural number' 
can be expressed as follows: 

    fun
        Zero : Nat ;
        Succ : Nat -> Nat ;

The signatures for 'even number' and 'prime number' can be captured with

    fun
        Even, Prime : Nat -> Prop ;

Finally, we can abstract the logical 'not', 'and' and 'or' as follows:

    fun
        Not : Prop -> Prop ;
        And, Or : Prop -> Prop -> Prop ;

In practical terms, these declarations would form the body of an abstract module that would have the
form

    abstract Arith = {<body>}

where Arith is the name of the module.
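To make the division of labour between abstract and concrete syntax more tangible, the following Python sketch mimics the Arith module. It is not GF code and not part of MGL: the names `Tree`, `SIGNATURES`, `category`, `linearize`, `ENGLISH` and `SYMBOLIC` are hypothetical, and each "concrete grammar" is reduced to one string template per function, whereas real GF concrete grammars also handle agreement, inflection and word order.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Tree:
    """An abstract syntax tree: a function name applied to subtrees."""
    fun: str
    args: Tuple["Tree", ...] = ()


# Signatures copied from the abstract module:
# function name -> (argument categories, result category).
SIGNATURES = {
    "Zero": ((), "Nat"),
    "Succ": (("Nat",), "Nat"),
    "Even": (("Nat",), "Prop"),
    "Prime": (("Nat",), "Prop"),
    "Not": (("Prop",), "Prop"),
    "And": (("Prop", "Prop"), "Prop"),
    "Or": (("Prop", "Prop"), "Prop"),
}


def category(tree: Tree) -> str:
    """Type-check a tree against SIGNATURES and return its category."""
    expected, result = SIGNATURES[tree.fun]
    found = tuple(category(arg) for arg in tree.args)
    if found != expected:
        raise TypeError(f"{tree.fun} expects {expected}, got {found}")
    return result


# Two hypothetical "concrete grammars": one template per abstract function.
ENGLISH = {
    "Zero": "zero",
    "Succ": "the successor of {0}",
    "Even": "{0} is even",
    "Prime": "{0} is prime",
    "Not": "it is not the case that {0}",
    "And": "{0} and {1}",
    "Or": "{0} or {1}",
}
SYMBOLIC = {
    "Zero": "0",
    "Succ": "S({0})",
    "Even": "even({0})",
    "Prime": "prime({0})",
    "Not": "¬{0}",
    "And": "({0} ∧ {1})",
    "Or": "({0} ∨ {1})",
}


def linearize(tree: Tree, grammar: Dict[str, str]) -> str:
    """Render a tree bottom-up with the templates of one concrete grammar."""
    parts = [linearize(arg, grammar) for arg in tree.args]
    return grammar[tree.fun].format(*parts)


zero = Tree("Zero")
one = Tree("Succ", (zero,))
claim = Tree("And", (Tree("Even", (zero,)), Tree("Not", (Tree("Prime", (one,)),))))

print(category(claim))             # Prop
print(linearize(claim, ENGLISH))
print(linearize(claim, SYMBOLIC))
```

The single tree `claim` is rendered by both template sets, which is the essential point of the scheme: meaning is fixed once in the abstract syntax, and each target notation only supplies its own rendering rules.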

3 The MGL library 

As in any application coded in GF, we need to specify what categories will be used. In the case of MGL, 
the most relevant categories are in correspondence with all possible combinations of Variable and Value 
with the mathematical types Number, Set, Tensor and Function. Thus the category VarNum denotes a 
numeric variable like x, while ValSet denotes an actual set like "the domain of the natural logarithm". 
The distinction between variables and values allows us to type-check productions like lambda abstrac- 
tions that require a variable as the first argument. Variables can be promoted to values when needed. 
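The effect of separating variables from values can be sketched in Python. This is only an illustration under assumptions: the classes `Var` and `Val`, the functions `promote` and `lambda_abstraction`, and the kind labels are hypothetical names, not MGL identifiers; the paper only tells us that categories such as VarNum and ValSet exist and that variables can be promoted to values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    """A variable of a given mathematical kind, e.g. Var("Num", "x")."""
    kind: str
    name: str


@dataclass(frozen=True)
class Val:
    """A value of a given mathematical kind, carrying its rendered text."""
    kind: str
    text: str


def promote(var: Var) -> Val:
    """Promote a variable to a value of the same kind."""
    return Val(var.kind, var.name)


def lambda_abstraction(var: object, body: Val) -> Val:
    """Build a function value; the bound argument must be a variable."""
    if not isinstance(var, Var):
        raise TypeError("the first argument must be a variable, not a value")
    return Val("Fun", f"the function mapping {var.name} to {body.text}")


x = Var("Num", "x")
root = Val("Num", f"the square root of {promote(x).text}")
print(lambda_abstraction(x, root).text)
# the function mapping x to the square root of x
```

Passing a value such as `root` as the first argument raises a `TypeError`, which is the Python analogue of a production being rejected by the type checker.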

The library is organised in a matrix-like form, with a horizontal axis ranging over the targeted
natural languages. At the moment these are: Bulgarian, Catalan, English, Finnish, French, German, Italian,
Polish, Romanian, Spanish, Swedish and Urdu. In addition, the mathematical typesetting system LaTeX
has also been included, as well as a natural language interface to Sage that makes it possible to elicit
results from this sophisticated computational environment with commands expressed in any of the natural
languages currently available.

The vertical axis is for complexity and contains, from bottom to top, three layers:

Ground. It deals with literals, indices and variables.

OpenMath. It is modelled after the OM Content Dictionaries (CD's), in the sense that in this layer
there is an MGL module for each CD.

Operations. This layer takes care of simple mathematical exercises. These appear in drilling materials
and usually begin with directives such as 'Compute', 'Find', 'Prove' and 'Give an example of'.

The following tree is an example of what can be expressed in the OpenMath layer: 

    mkProp
      (lt_num
        (abs (plus (BaseValNum (Var2Num x) (Var2Num y))))
        (plus (BaseValNum (abs (Var2Num x)) (abs (Var2Num y)))))

When linearized, say with the Spanish concrete grammar, it yields 

El valor absoluto de la suma de x y de y es menor que la suma del valor absoluto de x y del valor 
absoluto de y 
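Since LaTeX is one of the targets of the library, the same tree can in principle be rendered as a formula. The Python sketch below parses the tree text shown above and renders it with a handful of hand-written rules. The constructor names (`mkProp`, `lt_num`, `abs`, `plus`, `BaseValNum`, `Var2Num`) are taken from the tree; the functions `parse` and `to_latex` and the rendering rules themselves are assumptions made for illustration, not the MGL LaTeX grammar.

```python
import re

TREE = """
mkProp
  (lt_num
    (abs (plus (BaseValNum (Var2Num x) (Var2Num y))))
    (plus (BaseValNum (abs (Var2Num x)) (abs (Var2Num y)))))
"""


def parse(text):
    """Parse 'f (g x) y' into nested tuples ('f', ('g', ('x',)), ('y',))."""
    tokens = re.findall(r"[()]|[^\s()]+", text)
    pos = 0

    def application():
        nonlocal pos
        items = []
        while pos < len(tokens) and tokens[pos] != ")":
            if tokens[pos] == "(":
                pos += 1                      # consume "("
                items.append(application())
                if pos >= len(tokens):
                    raise ValueError("missing closing parenthesis")
                pos += 1                      # consume ")"
            else:
                items.append((tokens[pos],))  # an atom is a 1-tuple
                pos += 1
        if not items:
            raise ValueError("empty application")
        head, *args = items
        return head + tuple(args)

    tree = application()
    if pos != len(tokens):
        raise ValueError("unbalanced closing parenthesis")
    return tree


def to_latex(tree):
    """Render a parsed tree with a few illustrative rules."""
    head, *args = tree
    if head == "mkProp":
        return to_latex(args[0])
    if head == "Var2Num":
        return args[0][0]                     # the variable name itself
    if head == "abs":
        return r"\left| " + to_latex(args[0]) + r" \right|"
    if head == "plus":
        (operands,) = args                    # a BaseValNum list node
        return " + ".join(to_latex(item) for item in operands[1:])
    if head == "lt_num":
        return to_latex(args[0]) + " < " + to_latex(args[1])
    if not args:
        return head                           # any other atom
    raise ValueError(f"no rendering rule for {head}")


print(to_latex(parse(TREE)))
# \left| x + y \right| < \left| x \right| + \left| y \right|
```

The rules mirror the structure of a concrete grammar: one rendering rule per abstract constructor, applied recursively to the subtrees.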

Similarly, the tree 

    DoSelectFromN
      (Var2Num y)
      (domain (inverse tanh))
      (mkProp
        (gt_num
          (At cosh (Var2Num y))
          pi))

gives, when linearized with the English concrete grammar: 

Select y from the domain of the inverse of the hyperbolic tangent such that the hyperbolic cosine of 
y is greater than pi. 

We end this section by describing two prototype services driven by MGL: the Mathbar demo and the 
gf sage service. 

Mathbar demo 

To access this demo, see [?]. Now consider, for example, the sentence "Gamma is greater than pi raised
to x", which can be easily composed by choosing Eng in the From slot and repeatedly choosing the
desired word among the continuation options presented at each stage. If we further choose All in the To
slot, we get the results shown in the screenshot.

At the bottom, we can see the LaTeX typesetting of the expression "\gamma > \pi ^ x":

    $\gamma > \pi^{x}$

Remark. There are a few details in some of the concrete grammars that have to be improved. In the
case of Polish, 'podniesiona' should be 'podniesione', because 'pi' is neuter in that language, and
'wieksza' should be 'większa' (Adam Slaski, private communication). There is also a slight inconsistency in
the rendering of 'Gamma', since in French, Italian and Romanian it appears with 'g' while in all the other
languages it appears with 'G'. Actually it is not hard to modify the linearizations so that they produce
'π', 'γ' and 'Γ' instead of 'pi', 'gamma' and 'Gamma'.

Figure: screenshot of the Mathbar online demo with From: Eng and To: All, after composing the sentence
"Gamma is greater than pi raised to x" (continuation options offered: "and", "or", "raised"). The output reads:

    Abstract: mkProp (gt_num nums1_gamma (power nums1_pi (Var2Num x)))
    Bul:      (Cyrillic text not recoverable)
    Cat:      Gamma és més gran que pi elevat a x
    Eng:      Gamma is greater than pi raised to x
    Fin:      Gamma on suurempi kuin pii korotettuna x:aan
    Fre:      gamma est plus grand que pi élevé à x
    Ger:      Gamma ist größer als Pi hoch x
    Ita:      gamma è maggiore di pi elevato a x
    LaTeX:    \gamma > \pi ^ x
    Pol:      Gamma jest wieksza niz pi podniesiona do x
    Ron:      gamma este mai mare decât pi ridicat la x
    Spa:      Gama es mayor que pi elevado a x
    Swe:      Gamma är större än pi upphöjd till x
    Urd:      (text not recoverable)

Remark. The Mathbar demo includes a button labelled "Try Google Translate". When we try it for the
different languages, there are cases in which we get the same result (Catalan, Romanian, Spanish, Swedish),
but in others the result is different, and often wrong:

    Bul  (Cyrillic text not recoverable)
    Fin  Gamma on suurempi kuin PI nostetaan x
    Fre  Gamma est supérieure à Pi portée à x
    Ger  Gamma größer als pi um x erhöht
    Ita  Gamma è superiore a pi elevato a x
    Pol  Gamma jest większa niż pi podniesiony do x
    Urd  (text not recoverable)



Sage commands in natural language 

Another recently developed prototype based on MGL is gf sage. It makes it possible to express Sage
commands in natural language and to get the results expressed likewise. The tool starts a Sage notebook
server in the background (as described in Simple Sage Server API, [?]), reads the PGF grammar file and
translates the queries from the chosen natural language to the concrete grammar for Sage. The result is
passed to the Sage server for evaluation. Each computation runs in a different worksheet cell and the
server replies with a done or a computing message. In the latter case the program waits for completion
of the computation and then writes the answer.
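The done/computing exchange amounts to a simple polling loop. The following Python sketch shows the idea with a stand-in for the server: `wait_for_answer` and `fake_cell` are hypothetical names, the status strings are taken from the description above, and nothing here reproduces the actual Simple Sage Server API.

```python
import time
from typing import Callable, Iterable, Tuple

Reply = Tuple[str, str]  # (status, payload)


def wait_for_answer(poll: Callable[[], Reply],
                    delay: float = 0.0,
                    max_polls: int = 100) -> str:
    """Poll a cell until it reports 'done' and return its output."""
    for _ in range(max_polls):
        status, payload = poll()
        if status == "done":
            return payload
        if status != "computing":
            raise ValueError(f"unexpected status: {status!r}")
        print("waiting...")
        time.sleep(delay)
    raise TimeoutError("the computation did not finish in time")


def fake_cell(replies: Iterable[Reply]) -> Callable[[], Reply]:
    """Stand-in for a worksheet cell: returns the scripted replies in order."""
    iterator = iter(replies)
    return lambda: next(iterator)


poll = fake_cell([("computing", ""), ("computing", ""), ("done", "4/3*sqrt(2) - 2/3")])
result = wait_for_answer(poll)
print("answer: it is", result)
```

Bounding the number of polls is a design choice of the sketch, so that a computation that never completes surfaces as an error instead of blocking the dialogue.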

From the GF side, what is sent to Sage is always in the category Command. What is returned by
Sage is in the category Answer. There are three kinds of Commands:

• Asking for a computation. Compute: Kind -> Value Kind -> Command.
  Sage gives back a ReturnBlock with the cell number and the answer (a string). We can then
  construct a short Answer by using:

  - Simple: k ∈ Kind -> Value k -> Answer
    ("it is 5"), or

  - Feedback: k ∈ Kind -> Value k -> Value k -> Answer
    ("the factorial of 3 is 6"), which combines the question (the first Value k) with the Sage answer.

• Assuming propositions. Assume: Prop -> Command.
  Sage silently accepts the command by returning an EmptyBlock (with the cell number), but we want
  it to be more assertive, so we reinject the Prop into Assumed: Prop -> Answer
  ("I assume that x is greater than 2").

• Binding values to variables. Assign: k ∈ Kind -> Var k -> Value k -> Command
  ("assign 2 to x").
  We expect Sage to return an EmptyBlock followed by
  Assigned: k ∈ Kind -> Var k -> Value k -> Answer
  ("2 is now assigned to x").
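The pairing of Commands, Sage blocks and Answers described in this list can be sketched in Python as follows. The class names follow the categories and constructors named above (Compute, Assume, Assign, ReturnBlock, EmptyBlock); the function `answer`, the `feedback` flag and the use of plain strings in place of typed GF trees are simplifying assumptions of the sketch.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Compute:
    value: str      # rendered question, e.g. "the factorial of 3"


@dataclass(frozen=True)
class Assume:
    prop: str       # rendered proposition


@dataclass(frozen=True)
class Assign:
    var: str
    value: str


@dataclass(frozen=True)
class ReturnBlock:
    cell: int
    text: str       # the string returned by Sage


@dataclass(frozen=True)
class EmptyBlock:
    cell: int


Command = Union[Compute, Assume, Assign]
Block = Union[ReturnBlock, EmptyBlock]


def answer(command: Command, block: Block, feedback: bool = False) -> str:
    """Build the natural language Answer for a Command and the block Sage returned."""
    if isinstance(command, Compute):
        if not isinstance(block, ReturnBlock):
            raise ValueError("a computation must come back with a ReturnBlock")
        if feedback:                                   # Feedback
            return f"{command.value} is {block.text}"
        return f"it is {block.text}"                   # Simple
    if not isinstance(block, EmptyBlock):
        raise ValueError("Assume and Assign expect an EmptyBlock")
    if isinstance(command, Assume):                    # Assumed
        return f"I assume that {command.prop}"
    if isinstance(command, Assign):                    # Assigned
        return f"{command.value} is now assigned to {command.var}"
    raise TypeError(f"unknown command: {command!r}")


print(answer(Compute("the factorial of 3"), ReturnBlock(1, "6"), feedback=True))
print(answer(Assume("x is greater than 2"), EmptyBlock(2)))
print(answer(Assign("x", "2"), EmptyBlock(3)))
```

In the library itself the Answer is an abstract tree that is then linearized in the user's language; the sketch fixes English strings only to keep the control flow visible.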

Here are some illustrations: 

    sage> compute the sum of 1, 2, 3, 4 and 5.
    [4] 15
    answer: it is 15

    sage> compute the summation of x when x ranges from 1 to 100.
    [7] 5050
    answer: it is 5050

    sage> compute the integral of the cosine on the open interval
          from 0 to the quotient of pi and 2.
    [8] 1
    answer: it is 1

    sage> compute the integral of the function mapping x
          to the square root of x on the closed interval from 1 to 2.
    waiting...
    [4] 4/3*sqrt(2) - 2/3
    answer: it is 4/3*sqrt(2) - 2/3

    sage> compute the sum of x and y.
    [4] x + y
    answer: it is x plus y.

    sage> compute the sum of x and 5.
    [5] x + 5
    answer: it is x plus 5.

    sage> compute the sum of 4 and 5.
    [6] 9
    answer: it is 9.



Dealing with word problems 

Let us return to our word problem example in Section 1 in order to consider the difficulties posed by a
full computer representation of its most relevant aspects, and also to point out some hints about how to
achieve it. First of all, human readers are expected to make sense of information
that is not stated explicitly but which they usually infer from the semantic context. In our example, it
is enough to write the inferred assertions next to the assertions given in the word problem. Notice that
some of the inferences amount to making explicit the implicit references.

    A farm has ducks and rabbits.       1. A farm has no animals other than
                                           ducks and rabbits.
    There are 100 animals               2. There are 100 animals in the farm.
    and they have 260 legs.             3. The animals in the farm have 260 legs.
    How many ducks and rabbits          4. How many ducks are there in the farm?
    are there in the farm?                 How many rabbits are there in the farm?

Let us proceed now with a few hints about how the right-hand side statements in the table could be 
elicited from the left-hand side ones. 

1. The line can be parsed except for farm, ducks and rabbits, which are unknown to MGL. It can
be inferred, however (using the structure available from the GF parser), that these unknowns are
common nouns. Then a query to WordNet [16] finds entries compatible with this assumption. From
the determiners used, we deduce that there is an instance f of the entity FARM (F) and that there
are entities DUCK and RABBIT (D and R, respectively). The verb has is a priori related to the IN
predicate:

$f \in F$, $|D \cap \mathrm{IN}(f)| \geq 1$,* $|R \cap \mathrm{IN}(f)| \geq 1$,* $A \cap \mathrm{IN}(f) \setminus (D \cup R) = \emptyset$.

2. Animals is a new common noun leading to a new entity A. Another query to WordNet reveals that
it is, in fact, a hypernym of duck and rabbit:

$|A \cap \mathrm{IN}(f)| = 100$, $D, R \subset A$.

3. The noun legs gives rise to another entity (L) and the occurrence of have introduces a new version
of IN:

$|L \cap \mathrm{IN}(A \cap \mathrm{IN}(f))| = 260$.

4. WordNet points out that a farm is a location, so there probably refers to f. A how many question
asks for $d = |D \cap \mathrm{IN}(f)|$ and $r = |R \cap \mathrm{IN}(f)|$.
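To see where these hints lead, the constraints can be turned into a small linear system in the unknowns d and r of item 4. This is only a sketch under two assumptions that the text above does not state: that no animal is both a duck and a rabbit (so that the two counts add up to the number of animals), and the world knowledge that a duck has 2 legs and a rabbit has 4.

```latex
\begin{align*}
d + r &= 100, & 2d + 4r &= 260, \\
2r &= 260 - 2 \cdot 100 = 60, & r &= 30, \quad d = 100 - 30 = 70.
\end{align*}
```

Both counts are positive, so the farm does have ducks and rabbits, and the check $2 \cdot 70 + 4 \cdot 30 = 260$ confirms the number of legs. The leg counts per animal are exactly the kind of implicit knowledge that a full representation of the problem would still have to supply.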



* It is not hard to have instances of the problem whose solution has no rabbits (or no ducks). This may come as a surprise
to the student, but it is mathematically acceptable by common practice. If we were to follow this convention, then we would
drop these inequalities.

4 Conclusions and further work

In this paper we have described a GF library, which we call MGL, for multilingual mathematical text
processing. We have also indicated how it originated in the WebALT project, its relation to GF, and
its present functionality. After a first step in which the main concern was tidying and modularizing the
WebALT prototype for simple mathematics exercises in five languages, we have extended it, in a second
step, with four more languages (Finnish was considered in the first step, but it had to be worked out from
scratch in the second step). We have also shown that LaTeX and Sage can be approached with the same
methodology. In particular, gf sage makes it possible to interact with Sage by expressing the commands
in natural language.

Further work has three main lines: 

• Addition of new languages, such as Danish, Dutch, Norwegian, Portuguese and Russian. This is a
continuation of the first two steps referred to above and our assessment is that it can be done
reliably with the methods and procedures established so far. To some extent, the library modules
for a new language can be generated automatically, up to the point at which the remaining work
must be done by native speakers of that language.

• Describing a systematic procedure for the uniform and reliable extension of the grammars according
to new semantic needs. This is an important step that is being researched from several angles.
One important point is to ascertain when a piece of mathematical text requires functionalities
(categories, constructors, operations) not yet covered by MGL.

• Advancing in the use of MGL for the production of ever more sophisticated artificial mathematics
assistants. This is also the focus of current research, which includes combining MGL with statistical
machine translation methods, as in principle they can suggest grammatical structures out of a
corpus of mathematical sentences. One important element will be an extended version of gf sage
that will make it possible to harness a powerful CAS such as Sage by means of commands expressed
in natural languages. We also envision a similar prototype to harness the capabilities of CTPs.
After this, we hope that we will be in a position to produce an MMA that can help students in
solving and learning how to solve word problems of the kind we have been considering.

How to get MGL 

The latest development version of the library is publicly available via Subversion:

    svn co svn://molto-project.eu/mgl

A stable version can be found at:

    svn co svn://molto-project.eu/tags/D6.1